Modeling Additive Structure and Detecting Interactions with Groves of Trees

Author

  • Daria Sorokina
Abstract

Discovery of additive structure is an important step towards understanding a complex multi-dimensional function, because it allows this function to be expressed as the sum of lower-dimensional or otherwise simpler components. Modeling additive structure also opens up opportunities for learning better regression models. The term statistical interaction describes the presence of non-additive effects among two or more variables in a function. When variables interact, their effects must be modeled and interpreted simultaneously. Detecting statistical interactions can therefore be critical for domain researchers trying to understand the underlying processes. This dissertation analyzes the benefits of modeling additive structure for prediction and interaction detection problems. It describes a new learning algorithm called Groves, which is an ensemble of additive regression trees. Groves is based on existing techniques such as bagging and additive models; their combination allows us to use large trees in the ensemble and at the same time model the additive structure of the response function. The regression version of the algorithm, Additive Groves, and its classification counterpart, Gradient Groves, yield consistently high performance across a variety of problems, outperforming on average a large number of other algorithms. The additive nature of Groves makes it particularly useful for interaction detection. This dissertation introduces a new approach to interaction detection, based on comparing the performance of restricted and unrestricted predictive models. Groves of trees allow variable interactions to be carefully controlled and are therefore especially useful for this framework. The details of the proposed practical approach to interaction detection analysis are demonstrated on real data describing the abundance of different species of birds in the prairies east of the southern Rocky Mountains.

… worked for a year as a junior researcher at the Russian Academy of Sciences.
Daria spent four years in graduate school at Cornell, where she did research on statistical interactions and ornithology applications with Rich Caruana and Mirek Riedewald, briefly interrupted by internships at Fraunhofer IPSI and Google Pittsburgh. Daria defines her research interests as a blurry area between data mining and machine learning.

ACKNOWLEDGEMENTS

This work would not have been possible without the help and influence of many people I met during my studies at Cornell. First of all, I would like to thank my advisor, Rich Caruana. His experience and support were invaluable to me. I have learned a lot from him, both about machine learning and about how to do research. Many ideas that later formed this thesis emerged in discussions …
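The restricted-versus-unrestricted comparison described in the abstract can be illustrated with a small pure-Python sketch. It is not the dissertation's procedure: here the "restricted" model simply ties each tree to a single variable so that the two variables cannot interact, the trees are tiny depth-2 trees, and the names `fit_tree`, `fit_grove`, and `interaction_gap` are hypothetical helpers introduced only for this example.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, y, depth, features):
    """Greedy regression tree that may split only on the listed feature indices."""
    mean = sum(y) / len(y)
    if depth == 0 or len(y) < 4:
        return lambda x: mean
    best = None  # (error, feature index, threshold)
    for j in features:
        for t in sorted({row[j] for row in X})[1:]:
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, t)
    if best is None:  # every allowed feature is constant in this node
        return lambda x: mean
    _, j, t = best
    lo = [i for i, row in enumerate(X) if row[j] < t]
    hi = [i for i, row in enumerate(X) if row[j] >= t]
    lo_tree = fit_tree([X[i] for i in lo], [y[i] for i in lo], depth - 1, features)
    hi_tree = fit_tree([X[i] for i in hi], [y[i] for i in hi], depth - 1, features)
    return lambda x: lo_tree(x) if x[j] < t else hi_tree(x)

def fit_grove(X, y, feature_sets, depth, passes=4):
    """Additive model with one tree per entry of feature_sets.

    Tree k may use only feature_sets[k] and is refit on the residual
    left by all the other trees (assumption: a fixed number of passes).
    """
    trees = [lambda x: 0.0 for _ in feature_sets]
    for _ in range(passes):
        for k, feats in enumerate(feature_sets):
            others = [t for i, t in enumerate(trees) if i != k]
            resid = [yi - sum(t(x) for t in others) for x, yi in zip(X, y)]
            trees[k] = fit_tree(X, resid, depth, feats)
    return lambda x: sum(t(x) for t in trees)

def mse(model, X, y):
    return sum((model(x) - yi) ** 2 for x, yi in zip(X, y)) / len(y)

def interaction_gap(f, seed=0, n=150):
    """Held-out error of the restricted model minus that of the unrestricted one."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(2 * n)]
    y = [f(x) for x in X]
    X_train, y_train, X_test, y_test = X[:n], y[:n], X[n:], y[n:]
    # Restricted: no tree sees both variables, so they cannot interact.
    restricted = fit_grove(X_train, y_train, [[0], [1]], depth=2)
    # Unrestricted: every tree may split on both variables.
    unrestricted = fit_grove(X_train, y_train, [[0, 1], [0, 1]], depth=2)
    return mse(restricted, X_test, y_test) - mse(unrestricted, X_test, y_test)

# Synthetic targets (assumptions for illustration only).
gap_product = interaction_gap(lambda x: 4.0 * (x[0] > 0.5) * (x[1] > 0.5))
gap_additive = interaction_gap(lambda x: 2.0 * (x[0] > 0.5) + 2.0 * (x[1] > 0.5))
print("gap with interaction:", gap_product)
print("gap without interaction:", gap_additive)
```

A large gap suggests that forbidding the interaction hurts the model, which is the signal this kind of comparison looks for; how large a gap counts as significant is left open here.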

Related resources

Additive Groves of Regression Trees

We present a new regression algorithm called Additive Groves and show empirically that it is superior in performance to a number of other established regression methods. A single Grove is an additive model containing a small number of large trees. Trees added to a Grove are trained on the residual error of other trees already in the model. We begin the training process with a single small tree ...
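The idea of training each tree on the residual error of the other trees can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the published algorithm: the trees are depth-1 stumps rather than large trees, the refitting runs for a fixed number of passes, and the gradual growth schedule mentioned in the snippet is omitted. The helper names `fit_tree` and `fit_grove` are hypothetical.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, y, depth):
    """Greedy regression tree of the given depth; returns a predict(x) function."""
    mean = sum(y) / len(y)
    if depth == 0 or len(y) < 4:
        return lambda x: mean
    best = None  # (error, feature index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[1:]:
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, j, t)
    if best is None:  # every feature is constant in this node
        return lambda x: mean
    _, j, t = best
    lo = [i for i, row in enumerate(X) if row[j] < t]
    hi = [i for i, row in enumerate(X) if row[j] >= t]
    lo_tree = fit_tree([X[i] for i in lo], [y[i] for i in lo], depth - 1)
    hi_tree = fit_tree([X[i] for i in hi], [y[i] for i in hi], depth - 1)
    return lambda x: lo_tree(x) if x[j] < t else hi_tree(x)

def fit_grove(X, y, n_trees, depth, passes=5):
    """Additive model: the prediction is the sum of n_trees trees.

    Each tree is repeatedly refit on the residual left by the other
    trees (assumption: a fixed number of passes instead of a
    convergence check).
    """
    trees = [lambda x: 0.0 for _ in range(n_trees)]
    for _ in range(passes):
        for k in range(n_trees):
            others = [t for i, t in enumerate(trees) if i != k]
            resid = [yi - sum(t(x) for t in others) for x, yi in zip(X, y)]
            trees[k] = fit_tree(X, resid, depth)
    return lambda x: sum(t(x) for t in trees)

def mse(model, X, y):
    return sum((model(x) - yi) ** 2 for x, yi in zip(X, y)) / len(y)

# Synthetic additive target (an assumption for illustration only).
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [2.0 * (x[0] > 0.5) + 3.0 * (x[1] > 0.3) for x in X]

stump_mse = mse(fit_tree(X, y, depth=1), X, y)
grove_mse = mse(fit_grove(X, y, n_trees=2, depth=1), X, y)
print("single stump:", stump_mse, "grove of two stumps:", grove_mse)
```

On this additive target a single stump can capture only one of the two effects, while the two-tree grove captures both because the second tree is fit to what the first one leaves unexplained.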

Additive Groves in LTRC: Application of Additive Groves to the Learning to Rank Challenge

This paper describes the submission of team AG to the Yahoo! Learning to Rank Challenge held in 2010. The solution placed 4th in the main track. The primary algorithm used is Additive Groves of regression trees. 1. Competition and Data: Yahoo! Labs organized the first Learning to Rank Challenge in spring 2010. The challenge ran from March 1 to May 31 and received 4,736 submissions from...

The role of trees as a natural index in post-disaster reconstruction (Case Study: Palm groves of Bam, Following the 2003 Bam earthquake)

Background & objective: Trees, as an influential element, play an important role in post-disaster reconstruction in four respects: they can serve as "temporary settlement materials", help in "reviving collective memories" and "creating calm", and provide "motivation for reconstruction". In addition, as "living memorials", they serve as reminders of the disaster and indicate the necessity of preparedness and resilience of societ...

Application of Additive Groves to the Learning to Rank Challenge

This is a description of the team AG submission to the Learning to Rank Challenge. The solution placed 4th in the main track. The primary algorithm used is Additive Groves of regression trees.

Evaluation of Palm Groves Technical Efficiency Using Bootstrap Data Envelopment Analysis: A Case Study of Roodkhanehbar Area, Iran

The Roodkhanehbar area, with approximately 111 thousand Keriteh palm trees, is one of the most important date-producing areas in Rudan County and a direct or indirect source of income for the people of the area. As a result, production efficiency is critically important to the orchardists in this region. This study aims to evaluate technical efficiency of palm groves in this ...

Publication date: 2008